Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add setNodeNetworkUnavailable option for calico node #9479

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

yuzhiquan
Copy link

Description

New feature, add an option setNodeNetworkUnavailable, default is true, control the behavior after calico-node is ready
and the calico node to set NodeNetworkUnavailable = false or not.
Some times, in our cluster, we have noticed that the node is ready and NodeNetworkUnavailable = false. but our network is not ready to accept the requests, so we want to add this option to control this.
And we also can find the options in other network projects like flannel or cilium.

Related issues/PRs

Todos

  • Tests
  • Documentation
  • Release note

Release Note

Add setNodeNetworkUnavailable option for calico node

Reminder for the reviewer

Make sure that this PR has the correct labels and milestone set.

Every PR needs one docs-* label.

  • docs-pr-required: This change requires a change to the documentation that has not been completed yet.
  • docs-completed: This change has all necessary documentation completed.
  • docs-not-required: This change has no user-facing impact and requires no docs.

Every PR needs one release-note-* label.

  • release-note-required: This PR has user-facing changes. Most PRs should have this label.
  • release-note-not-required: This PR has no user-facing changes.

Other optional labels:

  • cherry-pick-candidate: This PR should be cherry-picked to an earlier release. For bug fixes only.
  • needs-operator-pr: This PR is related to install and requires a corresponding change to the operator.

@yuzhiquan yuzhiquan requested a review from a team as a code owner November 15, 2024 03:37
@marvin-tigera marvin-tigera added this to the Calico v3.30.0 milestone Nov 15, 2024
@marvin-tigera marvin-tigera added release-note-required Change has user-facing impact (no matter how small) docs-pr-required Change is not yet documented labels Nov 15, 2024
@CLAassistant
Copy link

CLAassistant commented Nov 15, 2024

CLA assistant check
All committers have signed the CLA.

@yuzhiquan yuzhiquan force-pushed the yzq/add-option-set-node-status branch from c1331e8 to 434bc9b Compare November 15, 2024 03:38
@caseydavenport
Copy link
Member

but our network is not ready to accept the requests, so we want to add this option to control this.

Out of curiosity, could you provide a bit more context for this? Is it Calico that is unable to accept requests or something else in the network?

How do you plan to configure this CLI flag - I don't think there's a way currently to set this when installing e.g., via the operator?

@yuzhiquan
Copy link
Author

Is it Calico that is unable to accept requests or something else in the network?

We believe the root cause is the number of our nodes has achieved the quota of full-mesh mode, so we have plan to enable RR in the future, this pr just wants to give another choice in this condition, we can mark the node as ready when we make sure the network is ready for receive requests.

How do you plan to configure this CLI flag - I don't think there's a way currently to set this when installing e.g., via the operator?

Sorry, I'm not familiar with this part. Do you have any suggestions?

@caseydavenport
Copy link
Member

Sorry, I'm not familiar with this part. Do you have any suggestions?

Yeah, it would need to be plumbed through the operator code here: https://github.com/tigera/operator/blob/master/pkg/render/node.go#L1280-L1310

Likely by introducing a a new API field to the Installation API: https://github.com/tigera/operator/blob/master/api/v1/installation_types.go#L32

e.g.,

spec:
  manageNodeConditions: Disabled

It would likely need to be passed through via an environment variable rather than a CLI flag (since the calico/node container configures each process using env vars typically).

I think it's probably OK to add this as an option.. although I think it's just working around the real problem(s):

  • Sounds like you're using BGP full mesh beyond its intended scale.
  • We don't have good heuristics to determine when the node is available or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs-pr-required Change is not yet documented release-note-required Change has user-facing impact (no matter how small)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants